Improving pathway prediction accuracy of constraints-based metabolic network models by treating enzymes as microcompartments

Metabolic network models have become increasingly precise and accurate as the most widespread and practical digital representations of living cells. The prediction functions were significantly expanded by integrating cellular resources and abiotic constraints in recent years. However, if unreasonable modeling methods were adopted due to a lack of consideration of biological knowledge, the conflicts between stoichiometric and other constraints, such as thermodynamic feasibility and enzyme resource availability, would lead to distorted predictions. In this work, we investigated a prediction anomaly of EcoETM, a constraints-based metabolic network model, and introduced the idea of enzyme compartmentalization into the analysis process. Through rational combination of reactions, we avoid the false prediction of pathway feasibility caused by the unrealistic assumption of free intermediate metabolites. This allowed us to correct the pathway structures of l-serine and l-tryptophan. A specific analysis explains the application method of the EcoETM-like model and demonstrates its potential and value in correcting the prediction results in pathway structure by resolving the conflict between different constraints and incorporating the evolved roles of enzymes as reaction compartments. Notably, this work also reveals the trade-off between product yield and thermodynamic feasibility. Our work is of great value for the structural improvement of constraints-based models.


Introduction
With the enhancement of parameter acquisition ability and the improvement of functional annotation, the available data has experienced a blowout growth [1][2][3][4][5].For the construction of genome-scale metabolic models (GEMs), the most instructive digital method of cell metabolic processes was developed more than 20 years ago [6,7], and the desire for refinement is becoming more ambitious than ever before.The automated construction process [8][9][10] and quality control methods [11] of metabolic network models have gradually become mature [12].In recent years, enzyme-constrained models were developed continuously, and automated processes developed based on the MOMENT principle [13], such as GECKO [14], sMOMENT [15] and ECMpy [16], have been applied rapidly [17].At the same time, enzyme-constrained models have been successfully integrated with thermodynamic constraints, such as ETFL (metabolism and expression model framework with thermodynamic constraints) [18] and ETGEMs (GEMs integrating enzymatic and thermodynamic constraints) [19].
During ETGEM development, we realized that due to the separation of some chemical reactions in the iML1515 stoichiometric framework [20], some prediction results were inconsistent with reported facts after the integration of thermodynamic constraints.For example, the combination of PGK_r, PGCD, GAPD, FBA and TPI reactions (catalyzed by phosphoglycerate kinase, phosphoglycerate dehydrogenase, glyceraldehyde 3-phosphate dehydrogenase, fructose-bisphosphate aldolase and triose-phosphate isomerase, respectively) was mistakenly considered to be thermodynamic infeasible [19].Therefore, it is time to split and merge GEM reactions according to the real intracellular situation.We also realized that if the biochemical reactions were excessively split, the correct understanding of remarkable adaptive strategies that cells have evolved, such as enzyme complexes, fusion enzymes and multifunctional enzymes, would be missed [21][22][23].Conversely, if we take the initiative to disassemble the links between reactions, we may obtain new possibilities in pathway diversity, broaden the solution space, and develop new compounds synthesis method.
The concept of compartmentalization is frequently mentioned both in model construction and practical research.Prokaryotic GEMs are generally divided into two or three regions including cytoplasmic, extracellular and periplasmic compartments, whereby the latter is only present in Gram-negative bacteria, while conditional connectivity is achieved through transport and exchange reactions.In GEMs of eukaryotes such as of Saccharomyces cerevisiae and Yarrowia lipolytica, here are more than ten independent cell divisions such as mitochondria and other organelles [24].The compartmentalization of engineered metabolic pathways in yeast mitochondria notably improved the production of branched-chain alcohols [25][26][27].Dating back further, the concept that enzymes and their microenvironment should be considered as compartments was proposed [28].Unfortunately, such a microscopic compartmentalization concept has not been effectively and clearly absorbed in the construction and application of metabolic network models or their extended versions.
In this work, we used the model EcoETM [19], an extended iML1515 by integration of thermodynamic and enzymatic constraints.By analyzing the synthetic pathways of L-serine and L-tryptophan, we revealed the necessity of considering the evolutionary strategy of multifunctional enzymes and enzyme complexes as metabolic compartments to effectively avoid unreasonable conflicts between different constraint levels in the extended GEMs.Finally, we demonstrated the application method and value of the multiple-constraint GEMs in the analysis of product synthesis pathways and improving the basic framework quality of GEMs.

Fundamental tools for analysis
All simulations were conducted using Jupyter Notebook with ETGEMs, a Python-based tool for constructing multi-constrained metabolic network models, which adopts the Cobrapy toolbox [29] and Pyomo software package [30].The construction principles and implementable functions of ETGEMs are described in the method section of our previous paper [19].
The code files, input files, and result files employed in this study are available for access at https://github.com/tibbdc/ETGEMs/tree/master/EnzCompart.

Analyzing bottleneck reactions
The bottleneck reaction [37] represents the step with the worst thermodynamic feasibility within a pathway, and maximizing its thermodynamic driving force (MDF) serves as the objective function in the OptMDFpathway method [33].The analysis of bottleneck reactions mainly involves several steps: 1) The lower limit of the flux for purpose product synthesis is set, and then the achievable MDF for the pathway is calculated.
2) Subsequently, the obtained MDF value is utilized as the constraint's lower bound to calculate the maximum product synthetic flux.3) Taking both the maximum flux and MDF values mentioned above as constraints, solving the pathway flux distribution through the pFBA algorithm.4) Solving the maximum MDF of each reaction in the pathway.5) Finally, the reaction where the maximum MDF is equal to MDF in step 1) is identified as the bottleneck reaction(s).

Identifying limiting metabolites
Limiting metabolites [37] are the metabolites that exert an influential impact on the thermodynamic level of bottleneck reactions, even with slight changes in concentration.Consequently, when a pathway reaches its thermodynamically optimal level, the concentration of limiting metabolite is no longer variable.This can be attributed to two reasons: 1) The concentration has reached the set upper or lower bound; 2) The metabolite plays roles as both substrate and product in different bottleneck reactions, resulting its concentration level being balanced.Given that limiting metabolites play a crucial role in bottleneck reactions, it becomes enough to focus on analyzing the variability in concentrations solely for the metabolites directly involved in bottleneck reactions.Ultimately, metabolites exhibiting no variability in concentration are classified as limiting metabolites.

Predicting key enzymes
Reactions catalyzed by key enzymes often have a larger flux control coefficient [38] and higher enzyme cost.Therefore, the minimum enzyme cost required for each reaction can be calculated based on the known reaction flux and MDF level.In enzyme constrained model, when the total enzyme amount imposes a constraint, the variability of the enzyme cost for a reaction in the pathway is often very small or may no longer exhibit variability.In contrast, if the pathway enzyme cost does not reach the total enzyme amount, the enzyme cost of reactions will show variability.

Prediction of the serine synthesis pathway using EcoETM
In our previous publication [19], we identified five distributed bottleneck reactions [37], PGCD, PGK_reverse, GAPD, FBA and TPI (see Table S1 of this manuscript and in attached table F of the cited reference), which led to the false prediction that the pathway in which they participate jointly is thermodynamically infeasible.Among them, PGCD is a well-known reaction in the L-serine synthesis pathway, and the other four reactions are also central to the glycolysis pathway.This subset is suitable as an example to illustrate the concept of distributed bottleneck reactions, but the conclusion that the classical EMP pathway cannot coexist with the conventional L-serine synthesis process is inconsistent with observations.Therefore, we analyzed the serine synthesis pathway in detail.The maximal thermodynamic driving force (MDF) of the L-serine synthesis pathways is variable, with 13 turning points, as shown in Fig. 1.The information of the bottleneck reaction(s) causing the MDF decline at each turning point is shown in Table S1, and the corresponding pathways at the turning points are detailed in Fig. S1.
The process of serine synthesis is divided into 13 stages according to the MDF levels.The MDF level in the first seven stages can reach more than 4 kJ/mol, and the thermodynamic feasibility is ideal.The MDF levels of the middle four stages are close to zero, and the feasibility is quite low.Finally, the MDF of the last two stages is negative, so it can be considered that they are thermodynamically infeasible.
Further analysis revealed that there are 2 localized and 11 distributed bottleneck reactions in total, representing the combination of reactions located in the central metabolic pathways (mainly in EMP, PPP and TCA) [39].For the yield indicator, the highest flux level that can be achieved at the end of the seventh stage is 8.99 mmol/gDW/h, which is only 42.7% compared with the highest flux of 21.03 mmol/gDW/h, and is unsatisfactory since it represents 57% carbon loss.
From the eighth stage, the MDF fell to 1.57 kJ/mol, which was caused by the localized bottleneck reaction of PGCD alone.Although the participation of this reaction reduced the MDF level of the pathway(s), it contributed significantly to the increase of yield, so that the synthetic flux of the pathway(s) reached 17.175 mmol/gDW/h, which was equivalent to 81.7% of the maximal yield.
From the ninth to the twelfth stage, the thermodynamic feasibility of the pathway(s) decreased gradually, which was caused by several distributed bottleneck reactions.The common point of these combinations of distributed bottleneck reactions is that the PGCD reaction is always sharing limiting metabolites with other bottleneck reactions, the overall thermodynamic feasibility of the pathway(s) is lower than in the eighth stage (when PGCD is used as a localized bottleneck reaction).According to the second law of thermodynamics (MDF ≥0), the flux limit of the last thermodynamically feasible pathway(s) at the end of the eleventh stage is 20.66 mmol/gDW/h, which is equivalent to 98.2% of the maximal yield.
At the thirteenth stage, there are still three distributed bottleneck reactions, but PGCD is not among them.By investigating the individual thermodynamic driving force levels of the three reactions, it was found that they are thermodynamically feasible in principle (in Table S2).However, due to the close association in sharing limiting metabolites, their simultaneous activity can make the pathway thermodynamically infeasible.This also shows that the thermodynamic feasibility based on the flux balance analysis (FBA) in GEMs with integrated multiconstraints is different from the simple preset reaction direction and reversibility.It should be noted that some anaerobic organisms, such as Desulfovibrio desulfuricans [40] and Clostridium drakei [41], adopt the POR5_r reaction (catalyzed by pyruvate synthase) to implement the reductive glycine pathway for autotrophic CO 2 fixation [42].In addition to POR5_r, FLDR2 (catalyzed by flavodoxin reductase) is the only reaction that can reduce the semi-oxidized flavodoxin in the iML1515 model.As a reaction that consumes reduced flavodoxin, POR5_r inevitably shares fluxes with FLDR2.In addition, FLDR2 also shares the b0684 and b2895 genes (fldA and fldB), encoding flavodoxin 1 and 2 respectively, with the POR5_r reaction.The coupling relationship between FLDR2 and POR5_r has been described by Maurice et al. [43].Recently, a similar coupling process has been used to construct an artificial minimal carbon fixation pathway, but the difference is only that several different ferredoxins were used as the electron donor instead of flavodoxin [44].Similar to the above two reactions, the PFL reaction is also only effective under anaerobic conditions [45], and requires the activation of flavodoxin reductase (FLDR2) [46].The three reactions can form a pathway under anaerobic conditions and the net reaction is the synthesis of formate from CO 2 , consuming a reducing equivalent provided by NADPH (Table S4).For E. coli, formate is the necessary source of reducing power in anaerobic deoxyribonucleotide synthesis [47].
Based on these results, it can be found that the PGCD reaction has a strong influence on the L-serine yield.With its participation, the maximum yield of pathway(s) can be increased from 42.7% to 98.2% of the theoretical yield predicted by the initial GEM, iML1515.According to the MetaCyc database, the PGCD reaction should indeed appear in the L-serine synthesis pathway of E. coli, which was proved by experimental results (in Fig. S2) [48,49].
In the combination of bottleneck reactions in stages 9-13 (Table S1), the distributed bottleneck reactions with or without PGCD are always located outside the core L-serine synthesis pathway (see Fig. S2 and Table S2).This indicates that the thermodynamic analysis and optimization of specific pathways alone may be inapplicable when they are placed in the cellular context, which reflects the differences between the solution for pathway MDF in the network [33] and the thermodynamic evaluation for specific pathways [50], indicating that the thermodynamic evaluation of preset pathways is likely to misdiagnose the actual situation in the complex intracellular system.

Learning the strategy from cells to overcome the thermodynamic bottleneck
The PGCD reaction is important for producing a high yield of L- serine, but its unfavorable thermodynamics are also a problem that must be solved.As shown in Fig. 2A, the PGCD reaction will release the reducing force of NADH.In 2017, Zhang et al. [51] proved that when PGCD is coupled with the reduction force consumption reaction catalyzed by the same enzyme expressed by the identical gene, there is a significant improvement of its thermodynamic feasibility.
In the iML1515 model, there are two potential natural coupling reactions with PGCD, AHGDx and ARHGDx (Fig. 2A).The above three reactions share the identical gene number (No. b2913), which therefore encodes a multifunctional enzyme.In the initial iML1515 model, the AHGDx and ARHGDx reactions were reversible, with the only difference between them being the substrate configuration.However, in the whole metabolic network, there is neither the configuration conversion process between their substrates, nor the consumption or silencing reaction with R-configuration substrate (R-2-hydroxyglutate, r2hglut), so ARHGDx is redundant and cannot participate in flux balance analysis.The Sconfiguration substrate of the AHGDx reaction (S-2-hydroxyglutarate, s2hglut) can only be used as the substrate of another irreversible reaction, SHGO, namely there are no other means to consume it.Therefore, it can be considered that the positive reaction direction of AHGDx in the model is also redundant, and only its reverse reaction is effective.The AHGDx_reverse reaction consumes reducing force, so when it is coupled with the SHGO reaction, the generation and consumption of carbonaceous metabolites can also be offset.This means that when the combined reaction continues to be coupled with PGCD, the thermodynamic feasibility of the overall reaction can be promoted by balancing the reducing force, and the metabolite 2-oxoglutarate (α-ketoglutarate, α-KG) can also be supplemented, as shown in Fig. 2A.
It is worth noting that the coupling of reactions is often used in enzyme activity detection [52], the construction of multi-enzyme system [53,54] and thermodynamic optimization of pathways [55,56], but the synthesis process of L-serine revealed by Zhang et al. [51] is fundamentally different.The main reason is that it is not only based on the coupling between reactions to change the metabolites concentrations in the cell, but there is also a correlation at the genetic level, so as to realize the reaction coupling within a protein structure, which offers a natural advantage for the transmission and capture of metabolites between reactions to strongly increase the thermodynamic driving force.This illustrates the significance of enzyme complexes and multifunctional enzymes, which can be considered the most basic level of cellular compartmentalization [57].
Then, a combined reaction catalyzed by SerA with a Δ r G i ′⁰ of 2.1 kJ/ mol was added to the model, and the equation was set as "3 pg_c + akg_c-> 3php_c + S2hglut_c", which was adopted to replace the original two partial reactions PGCD and AHGDx_ reverse.As the catalytic rate of an enzyme is constrained by the step with the weakest kinetic parameter, the lower k cat value of the AHGDx_reverse reaction was attributed to the combined reaction SerA, as illustrated in Fig. 2B.
In E. coli, the SerA forms a homotetramer, with each subunit possessing a complete set of substrate binding regions and catalytic functions.Studies have also shown that when the tetrameric structure is disrupted into dimers, catalytic activity decreases by 600-fold.These facts provide evidence for the compartmentalization of this enzyme [58][59][60].Indeed, catalyzing aKG not only enables NAD + replenishment but also helps prevent the reverse reaction, offering multiple forms of assistance in the transformation from 3 PG to PHP.As shown in Fig. 2 of the published work [60], the critical binding sites, R 60 and K 141 , for the substrates of the two partial reactions are closely located in spatial proximity to each other.
After the combination of reactions, the change of MDF in the L-serine synthesis process was recalculated (as shown in Fig. 2B).The pathway thermodynamic driving force was significantly improved.At the stage where the original thermodynamic feasibility reaches a good level (left side of the red dotted line), the achievable flux increased from 8.99 to 18.77 mmol/gDW/h.Moreover, after correction by merging reactions, the thermodynamic driving force in the whole yield space was optimized.Due to the double adjustment of reaction stoichiometric and enzyme kinetic parameters, the yield solution space was reduced by 9.1% (relative to the original thermodynamic feasible yield solution space, blue shaded area) to 10.7% (relative to the original total yield solution space, whole shaded area) as shown in Fig. 2B.In addition, the number of MDF levels was reduced and the pathway was only divided into five stages.See Fig. 2B and attached Table S3.
Because the POR5_r, FLDR2 and PFL reactions only work under anaerobic conditions, we closed the oxygen exchange reaction and combined the three reactions into one reaction named POFLx (Tables S4-6), and then predicted the L-serine synthesis pathway(s).As shown in Table S7 and Fig. S3, only three MDF levels were predicted, and the yield of the three pathways was generally low, generating byproducts such as acetate, ethanol, lactate and DHA.In previous studies, overflow metabolism under aerobic conditions was attributed to the limitation of enzyme resources, which did not reach the upper limit (<0.13 g enzyme/gDW) in the simulation results under anaerobic conditions.Therefore, the formation of fermentation products under anaerobic conditions is mainly caused by topological (for stages 1 and 2) and thermodynamic (for stage 3) reasons.Unsurprisingly, the absence of oxygen and the reactions it participates in greatly reduces the pathway solution space.
Among these three pathways, only the pathway of the third stage with the highest yield adopted the combined POFLx reaction, and the thermodynamic level of the pathway has not been improved at all.The MDF of the combined POFLx reaction was − 1.897 kJ/mol, the sum of the MDF values (− 0.632 kJ/mol) of the original three partial reactions.Then, we listed the optimal concentration levels of all metabolites involved in the three partial reactions (Table S8) and the combined reaction (Table S9).It can be seen that the intermediate metabolites avoided by the reactions combination are all within the concentration boundaries, while the NADPH/NADP (substrate, concentration ratio is limited to a narrow range), limiting metabolites CO 2 (substrate, low solubility) and formate (product) are all retained in the combined reaction.On the one hand, the thermodynamic feasibility of the partial reaction is overestimated due to the lack of a concentration ratio constraint of reduced and semi-oxidized flavodoxin in the model.Notably, it is especially easy to obtain this unfair advantage when the concentration constraint is relaxed in a multi-substrate reaction such as Fig. 2. Strategy for resolving the thermodynamic bottleneck in L-serine synthesis pathways.A. The biological phenomena and principles reproduced in this figure refer to the work performed by Zhang et al. [51].B. Thermodynamic feasibility analysis and comparison of L-serine synthesis processes.The result predicted by the initial model is indicated by the orange line, and the result predicted by models with combined reaction is indicated by the purple line.The left side of the red dotted line is the high thermodynamic feasibility range predicted by the initial model, while the shaded part represents the reduction of the yield space, while the blue part represents the space reduction of the thermodynamically feasible yield.

Reanalysis of L-tryptophan synthesis pathways
L-Tryptophan is a synthetic precursor of many bioactive components with medicinal value [63].As its essential precursor, the synthesis efficiency and feasibility of L-serine production should affect the performance of the L-Trp synthesis pathway.In previously published work, based on the unmodified EcoETM model of the L-serine synthesis process, the MDF levels of the L-tryptophan synthesis pathways were variable and always had acceptable thermodynamic feasibility (Fig. 3, the minimum value of MDF is 4.767 kJ/mol).This indicates that its pathway does not include PGCD reaction (the MDF is not greater than 1.57 kJ/mol when PGCD is involved).Hence, the recognized synthesis process of essential precursor L-serine was not adopted.Therefore, it was necessary to reanalyze the L-tryptophan synthesis process.
The initial MDF curve of the L-tryptophan synthesis pathway is shown in Fig. 3, and the analysis results of bottleneck reaction(s) at the turning point are shown in attached Table S10.
In Table S10, the MDF can be divided into ten levels.In the first level, the bottleneck reactions are the two incomplete semi-reactions ACONTa and ACONTb used to realize the production of isocitrate from citrate (via cis-aconitate) in the TCA cycle.Based on the understanding of the partial reaction process catalyzed within the same whole enzyme structure, although cis-aconitate is the product of the ACONTa reaction and the substrate of ACONTb, it is essentially different from the phenomenon of limited metabolite sharing between bottleneck reactions.Its generation and consumption are catalyzed by the same enzyme structure, which saves the time and energy costs for diffusion and capture, improving the affinity of the enzyme for the substrate molecule.By investigating the enzyme catalytic mechanism of aconitase, it can be found that the intermediate metabolite cis-aconitate normally does not dissociate from the active site, so that almost none is released into the cytoplasmic environment [64,65].Therefore, similar partial reactions should also be combined in the analysis process.
Most of the bottleneck reactions at stages 2-8 are found in central carbon metabolism, but there are also two additional high-frequency reactions, termed TRPS3 and TRPAS2_reverse.In the second stage, TRPS3 becomes one of the bottleneck reactions because it shares the metabolite G3P with the central metabolic reactions.As a direct synthesis reaction of L-tryptophan, TRPAS2_reverse often appears in pairs in the distributed bottleneck reactions from the third stage by sharing indole with the TRPS3 reaction.Contrary to expectations, L-serine is not required for TRPAS2_reverse to synthesize L-tryptophan, which directly leads to the absence of L-serine from the L-tryptophan synthesis pathway.Subsequently, we listed all the reactions in the iML1515 model that may generate L-tryptophan directly, as shown in Table 1.
All four reactions in Table 1 had acceptable thermodynamic parameters, as illustrated by the Δ r G i ′⁰ ⁰ values.Among them, both TRPS1 and TRPS2 use L-serine as substrate and share the identical gene ID.
Based on the literature [66], it can be confirmed that TRPS1 is an overall reaction catalyzed by whole tryptophan synthase, while TRPS2 is a partial reaction, which constitutes TRPS1 with the additional partial reaction TRPS3 (b1260 and b1261; 3ig3p_c-> g3p_c + indole_c) corresponding to the same genes in the model.Therefore, reaction TRPS1 should be retained, while the partial reactions TRPS3 and TRPS2, respectively catalyzed by the α and β subunits [65], should be closed.
After this adjustment, TRPAS2_reverse will be silenced by the shutdown of indole generation reaction TRPS3.According to the literature, the indole produced by the TRPS3 reaction will be transmitted to TRPS2 through a channel inside the enzyme [66], so that TRPAS2_reverse cannot normally capture the substrate indole.When driven by a direct supply of highly concentrated ammonia, TRPAS2_reverse can be used for L-tryptophan synthesis [67], but its major function is L-tryptophan decomposition [68].Furthermore, considering that the transmission of intermediate metabolites inside the enzyme molecule has overwhelming natural advantages, TRPAS2_reverse is inevitably eclipsed by TRPS1 [69].
In addition, the MTRPOX reaction does not have the ability to synthesize L-tryptophan in the model, and its substrate N-methyltryptophan (nmtrp_c) only participates in this one reaction, that is, it can only be consumed and not regenerated, so that the reaction does not participate in the FBA calculation process in most cases (except when the substrate N-methyltryptophan is directly supplied).Accordingly, it is redundant in the model and is also not a reasonable candidate for L-tryptophan synthesis.
Returning to the analysis of MDF levels, since all the bottleneck reactions at stages 9-10 are central metabolic reactions, and the L-tryp- tophan synthesis process is abnormal due to improper representation of stoichiometric equations, the thermodynamic MDF levels of L-trypto- phan pathway(s) should be redrawn after adjusting the stoichiometric framework based on the above analysis.The adjusted reactions are listed in Table 2.
The combination of the two groups of reactions is shown in Table 2, in which the reactions group of ACONT is reversible.The MDF information of the corrected L-tryptophan synthesis process is shown in Table S11.Overall, while both the number of steps and the MDF values increased, the maximal flux decreases from 4.13 to 3.27 g/gDW/h, representing an obvious yield reduction as shown in Fig. 3.
Kishore et al. published a quantitative and systematic study of the apparent thermodynamic difference between the overall biochemical

Table 1
All reactions for direct L-Trp formation in the iML1515 model.

No.
Reaction ID Reaction equation Gene ID X. Yang et al. reaction and the separate reactions [70] in the L-Trp synthesis process.
To test the rationale of reaction adjustment, we visualized the L-tryp- tophan synthesis pathway as shown in Fig. 4. The pathway structure is consistent with both the work of Kishore et al. [70] and the E. coli L-tryptophan synthesis pathway in the MetaCyc database.The synthesis pathway of L-tryptophan has three main confluence points, respectively corresponding to the ANS, ANPRT and TRPS1 reactions.For easy comparison, the pathways predicted by the iML1515 model and the original EcoETM are shown in Figs.S4 and S5, respectively.It can be seen that both pathways mistakenly use pyruvate and ammonia as direct substrates rather than L-serine.Therefore, although the introduction of enzymatic and thermodynamic constraints can reduce the yield solution space and reproduce the acetate overflow metabolism, the model is still not sufficiently realistic to avoid the prediction error of pathway structure and targets.

Discussion
Incomplete reactions in GEMs lead to unrealistic opportunities to participate in the pathway alone and increase the number of reaction combinations, which will expand the solution space of synthetic pathways.However, the separation of linked reactions will cause the exposure of intermediate metabolites and strengthen the constraints, which is unfavorable to the solution space of thermodynamic feasibility [50].Conversely, when the partial reactions are combined, the pathway yield may be partially lost due to limitations of thermodynamic feasibility.
The synthesis examples of L-serine and L-tryptophan illustrate the trade-off between pathway yield and thermodynamic driving force.In previous studies on thermodynamics, due to the lack of Gibbs free energy parameters of some reactions, the exposure of intermediate metabolites with yet unquantifiable energy level was reduced by combining reactions [71,72].Similarly, during the determination of enzyme kinetic parameters, the exposure of undetectable intermediate metabolites can also be avoided by using more easily detected and more stable end  products as indicators by coupling reactions [56,73].In the process of network analysis, the combination of reactions can also avoid the problem of combinatorial explosion, making it computationally feasible to analyze the main network topology [74].Thus, combined chemical reactions can provide feasibility and convenience for the study of several problems.In this work, the same small adjustment of combining reactions helped us compare overall reactions and partial reactions.The trade-off between yield and MDF, as depicted in Fig. 3, carries a dual significance: 1) it exists within each curve, and 2) it is evident when comparing two curves.The fundamental reason is identical for both the combination of reactions and the setting of high yield essentially excluding the possibility of some reaction combinations (i.e., pathways).Accordingly, both have the function of reducing the solution space.However, the combination of reactions is beneficial to the improvement of the thermodynamic driving force and unfavorable to the maintenance of the yield.Conversely, a high yield level setting is beneficial to maintain the yield, but unfavorable to maintain the thermodynamic driving force level.Although the reported maximum yield of L-serine synthesis is currently only less than 50% [75,76], this article predicts that there is still considerable potential for improvement within the feasible thermodynamic range.This suggests that the low yield is not a result of theoretical constraints, but rather can be enhanced by employing better regulation and fermentation methods.
Multifunctional enzymes or enzyme complexes can also be considered as cell compartmentalization strategies, which is of great significance for the feasibility of metabolic processes.This concept should be fully referenced in both the construction and application of constraintsbased GEMs.At present, the enzyme-constrained models only consider the effects of multifunctional enzymes and enzyme complexes in terms of the molecular weight of proteins [13,14], but underestimate the biologically meaningfully evolved efficiency of complex enzyme molecules such as pyruvate dehydrogenase (PDH) and 2-ketoglutarate dehydrogenase (AKGDH) [16], resulting in the overestimation of enzyme costs, which distorts the predicted pathway structure (the truly optimal pathway is replaced by the sub-optimal pathway due to the estimated deviation of enzyme cost) and/or flux (even if there is no false switching of pathways, the flux will be reduced due to excessive resource constraints), and makes pathway evaluation [77,78] unreliable.In addition, if the microscopic compartmentalization concept is reasonably adopted, more engineering possibilities may also be realized.For example, the enzyme complex encoded by the fadAB genes in the fatty acid β-oxidation pathway [79] can be disassembled, so that the intermediate metabolite β-ketoacyl-CoA can be released and then integrated into a variety of bio-polyesters [80].In addition, the internal structure of the enzyme can not only enable compartmentalization, but the undisturbed microenvironment can provide conditions for the local accumulation of metabolites [28,81].Therefore, although there is no structural coupling relationship between some enzyme functions such as internal channels, there may nevertheless promote the thermodynamic feasibility, such as reactions catalyzed by the "fused diaminohydroxyphosphoribosylaminopyrimidine deaminase and 5-amino-6-(5-phosphoribosylamino)uracil reductase (RibD)" [82] and "fused N-acetylglucosamine 1-phosphate uridyltransferase and glucosamine 1-phosphate acetyltransferase (GlmU)" [83] expressed by the genes b0414 and b3730, respectively.
Thus, it is important to set the parameters in the process of model correction and analysis based on the real intracellular conditions to obtain effective prediction information.At the same time, it is meaningful to investigate the structure and catalytic mechanism of enzymes [84,85], as typical molecular machines [86,87], to help us understand the intermediate metabolite(s) that mediate the relationship between reactions from multiple aspects.These include, 1) whether they will separate from enzyme molecule, 2) whether they will escape from the enzyme structure and the escape efficiency, as well as 3), whether the environment facilitates their diffusion, which will provide an important basis for estimation of the degree of thermodynamic promotion between reactions.In this manuscript, based on the known information that intermediate metabolite cis-aconitate is difficult to leak from the enzyme structure in E. coli, we consider the two partial reactions catalyzed by aconitase as an overall reaction.As evidence, the yield of itaconate synthesized from cis-aconitate in E. coli is much lower than that in its natural production strain, Aspergillus terreus [88].This is due to the efficient transportation of cis-aconitate produced by the TCA cycle from mitochondria to the cytoplasm in Aspergillus terreus, which leads to high production of itaconate.In contrast, E. coli lacks the compartmentalization necessary to maintain membrane structure.Last year, Ye et al. [89] developed a mutant 2-methylcitrate dehydratase (PrpD) to synthase the cis-aconitate, which is orthogonalization with the aconitase.Their work fully recognizes that the structure of enzymes constitutes compartmentalization, which is consistent with the core idea of this manuscript.Therefore, it can provide vivid support for the rationality of the overall reaction and processing methods in our work, and also demonstrate that compartmentalization and decompartmentalization are very valuable strategies.It can be said that trade-offs are ubiquitous.The compartmentalization of aconitase in E. coli improves the efficiency of TCA cycle, and also compresses the production capacity of cis-aconitate and its derivatives.Accordingly, improved in-depth analysis of protein structure and function is crucial for the simulation and prediction of cell behaviors [90].
In addition, the bottleneck reactions predicted in this study often include reactions in the central metabolic pathways, which may help us understand why there are oscillations in central metabolic processes such as glycolysis [91][92][93].All biochemical reactions in organisms cannot occur simultaneously due to constraints of thermodynamic feasibility and resource availability, just as all trains in a country cannot run simultaneously.Therefore, oscillations provide overall planning and coordination for the inner workings of the cellular system.This seems to be contrary to the theoretical basis of GEMs, which are based on the steady-state hypothesis and flux balance analysis [94], but just as computers will not operate in the same way as the human brain, this difference can be understood and accepted, so that nonequilibrium theory and the steady-state hypothesis have been and will continue to coexist and guide our reasoning [95,96].
The intelligence of cells is beyond our imagination, and how to reproduce cell space-time in digital cells will also be an interesting challenge, whereby whole-cell modeling [97] arguably represents the most hopeful vision.In addition to compartmentalization, cells may also obtain additional energy at high temperature, which can enable processes that are infeasible at room temperature, such as in some archaeal and thermophilic microorganisms [98].Perhaps we should be more optimistic in our assessment and verification of pathway feasibility [99].

CRediT authorship contribution statement
Xue Yang: conceived the project, developed the ETGEMs and constructed the EcoETM model, performed the calculation, Visualization, wrote the manuscript.Zhitao Mao: developed the ETGEMs and constructed the EcoETM model, maintains the calculation tool.Jianfeng Huang: contributed to the analysis and discussion.Ruoyu Wang: maintains the calculation tool, All authors read and approved the manuscript.Huaming Dong: contributed to the analysis and discussion.Yanfei Zhang: contributed to the analysis and discussion, wrote the manuscript.Hongwu Ma: conceived the project, wrote the manuscript.

Declaration of competing interest
The authors declare no competing financial interests.

Fig. 1 .
Fig. 1.The maximal thermodynamic driving force (MDF) of the serine synthesis pathway.The horizontal coordinate value indicates the rate of L-serine synthesis, the grey value at the turning point indicates the MDF level of pathway(s).The upper bound of the glucose uptake rate was set to 10 mmol/g DW/h.

Fig. 3 .
Fig. 3.The MDF levels of the L-Trp synthesis pathways.Comparison of the MDF levels comparison of L-Trp pathways before (blue color) and after correction (orange color).The horizontal coordinate value indicates the L-Trp synthesis rate, and the grey value at the turning point indicates the MDF level of pathway(s).The upper bound of the glucose uptake rate was set to 10 mmol/ gDW/h.

Fig. 4 .
Fig. 4. Predicted pathway of L-Trp synthesis.The revised reactions are indicated by red arrows, and the metabolites at the end of sub-pathway branches are indicated by a green background.The unit of the flux value is mmol/gDW/h (blue, on top), the unit of the enzyme cost is mg/gDW (purple, in the middle), and the unit of the maximal thermodynamic driving force is kJ/mol (orange, at the bottom).The upper limit of the glucose uptake rate was 10 mmol/gDW/h.

Table 2
Revised reactions in L-Trp synthesis process analysis.